{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Manually Training Vanna\n",
    "This notebook shows how to manually train Vanna. If you prefer to automatically train Vanna, see [here](vn-train.html)\n",
    "\n",
    "# Install Vanna\n",
    "First we install Vanna from [PyPI](https://pypi.org/project/vanna/) and import it.\n",
    "Here, we'll also install the Snowflake connector. If you're using a different database, you'll need to install the appropriate connector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install vanna"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import vanna as vn"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Login\n",
    "Creating a login and getting an API key is as easy as entering your email (after you run this cell) and entering the code we send to you. Check your Spam folder if you don't see the code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "api_key = vn.get_api_key('my-email@example.com')\n",
    "vn.set_api_key(api_key)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Set your Model\n",
    "You need to choose a globally unique model name. Try using your company name or another unique string. All data from models are isolated - there's no leakage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "vn.set_model('my-model') # Enter your model name here. This is a globally unique identifier for your model."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train with DDL Statements\n",
    "If you prefer to manually train, you do not need to connect to a database. You can use the train function with other parmaeters like ddl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vn.train(ddl=\"\"\"\n",
    "    CREATE TABLE IF NOT EXISTS my-table (\n",
    "        id INT PRIMARY KEY,\n",
    "        name VARCHAR(100),\n",
    "        age INT\n",
    "    )\n",
    "\"\"\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train with Documentation\n",
    "Sometimes you may want to add documentation about your business terminology or definitions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vn.train(documentation=\"Our business defines OTIF score as the percentage of orders that are delivered on time and in full\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train with SQL\n",
    "You can also add SQL queries to your training data. This is useful if you have some queries already laying around. You can just copy and paste those from your editor to begin generating new SQL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vn.train(sql=\"SELECT * FROM my-table WHERE name = 'John Doe'\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# View Training Data\n",
    "At any time you can see what training data is in your model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>training_data_type</th>\n",
       "      <th>question</th>\n",
       "      <th>content</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>15-doc</td>\n",
       "      <td>documentation</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a table in the PARTSUPP table.\\n\\nThe ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>11-doc</td>\n",
       "      <td>documentation</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a table in the CUSTOMER table.\\n\\nThe ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>14-doc</td>\n",
       "      <td>documentation</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a table in the ORDERS table.\\n\\nThe fo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1244-sql</td>\n",
       "      <td>sql</td>\n",
       "      <td>What are the names of the top 10 customers?</td>\n",
       "      <td>SELECT c.c_name as customer_name\\nFROM   snowf...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1242-sql</td>\n",
       "      <td>sql</td>\n",
       "      <td>What are the top 5 customers in terms of total...</td>\n",
       "      <td>SELECT c.c_name AS customer_name, SUM(l.l_quan...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>17-doc</td>\n",
       "      <td>documentation</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a table in the REGION table.\\n\\nThe fo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>16-doc</td>\n",
       "      <td>documentation</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a table in the PART table.\\n\\nThe foll...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1243-sql</td>\n",
       "      <td>sql</td>\n",
       "      <td>What are the top 10 customers with the highest...</td>\n",
       "      <td>SELECT c.c_name as customer_name,\\n       sum(...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1239-sql</td>\n",
       "      <td>sql</td>\n",
       "      <td>What are the top 100 customers based on their ...</td>\n",
       "      <td>SELECT c.c_name as customer_name,\\n       sum(...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>13-doc</td>\n",
       "      <td>documentation</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a table in the SUPPLIER table.\\n\\nThe ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1241-sql</td>\n",
       "      <td>sql</td>\n",
       "      <td>What are the top 10 customers in terms of tota...</td>\n",
       "      <td>SELECT c.c_name as customer_name,\\n       sum(...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>12-doc</td>\n",
       "      <td>documentation</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a table in the LINEITEM table.\\n\\nThe ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>18-doc</td>\n",
       "      <td>documentation</td>\n",
       "      <td>None</td>\n",
       "      <td>This is a table in the NATION table.\\n\\nThe fo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>1248-sql</td>\n",
       "      <td>sql</td>\n",
       "      <td>How many customers are in each country?</td>\n",
       "      <td>SELECT n.n_name as country,\\n       count(*) a...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1240-sql</td>\n",
       "      <td>sql</td>\n",
       "      <td>What is the number of orders placed each week?</td>\n",
       "      <td>SELECT date_trunc('week', o_orderdate) as week...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          id training_data_type  \\\n",
       "0     15-doc      documentation   \n",
       "1     11-doc      documentation   \n",
       "2     14-doc      documentation   \n",
       "3   1244-sql                sql   \n",
       "4   1242-sql                sql   \n",
       "5     17-doc      documentation   \n",
       "6     16-doc      documentation   \n",
       "7   1243-sql                sql   \n",
       "8   1239-sql                sql   \n",
       "9     13-doc      documentation   \n",
       "10  1241-sql                sql   \n",
       "11    12-doc      documentation   \n",
       "12    18-doc      documentation   \n",
       "13  1248-sql                sql   \n",
       "14  1240-sql                sql   \n",
       "\n",
       "                                             question  \\\n",
       "0                                                None   \n",
       "1                                                None   \n",
       "2                                                None   \n",
       "3         What are the names of the top 10 customers?   \n",
       "4   What are the top 5 customers in terms of total...   \n",
       "5                                                None   \n",
       "6                                                None   \n",
       "7   What are the top 10 customers with the highest...   \n",
       "8   What are the top 100 customers based on their ...   \n",
       "9                                                None   \n",
       "10  What are the top 10 customers in terms of tota...   \n",
       "11                                               None   \n",
       "12                                               None   \n",
       "13            How many customers are in each country?   \n",
       "14     What is the number of orders placed each week?   \n",
       "\n",
       "                                              content  \n",
       "0   This is a table in the PARTSUPP table.\\n\\nThe ...  \n",
       "1   This is a table in the CUSTOMER table.\\n\\nThe ...  \n",
       "2   This is a table in the ORDERS table.\\n\\nThe fo...  \n",
       "3   SELECT c.c_name as customer_name\\nFROM   snowf...  \n",
       "4   SELECT c.c_name AS customer_name, SUM(l.l_quan...  \n",
       "5   This is a table in the REGION table.\\n\\nThe fo...  \n",
       "6   This is a table in the PART table.\\n\\nThe foll...  \n",
       "7   SELECT c.c_name as customer_name,\\n       sum(...  \n",
       "8   SELECT c.c_name as customer_name,\\n       sum(...  \n",
       "9   This is a table in the SUPPLIER table.\\n\\nThe ...  \n",
       "10  SELECT c.c_name as customer_name,\\n       sum(...  \n",
       "11  This is a table in the LINEITEM table.\\n\\nThe ...  \n",
       "12  This is a table in the NATION table.\\n\\nThe fo...  \n",
       "13  SELECT n.n_name as country,\\n       count(*) a...  \n",
       "14  SELECT date_trunc('week', o_orderdate) as week...  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vn.get_training_data()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Removing Training Data\n",
    "If you added some training data by mistake, you can remove it. Model performance is directly linked to the quality of the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vn.remove_training_data(id='my-training-data-id')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Asking Questions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vn.ask()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
